A PIM (Processor-In-Memory) for Computer Graphics: Data Partitioning and Placement Schemes

Authors

  • Jae Chul Cha
  • Sandeep K. Gupta

Abstract

The demand for higher-performance graphics continues to grow because of the incessant drive toward realism. Moreover, rapid advances in fabrication technology have made it possible to build several processor cores on a single die. Hence, it is important to develop single-chip parallel architectures for such data-intensive applications. In this paper, we propose an efficient PIM architecture tailored for computer graphics, which requires a large number of memory accesses. We then address two tasks necessary for maximally exploiting the parallelism provided by the architecture, namely partitioning and placement of graphics data, which affect load balance and communication cost, respectively. Under the constraint of uniform partitioning, we develop approaches for optimal partitioning and placement that significantly reduce the search space. Because the search space for placement remains impractically large despite this reduction, we also present heuristics for identifying near-optimal placements. We then demonstrate the effectiveness of our partitioning and placement approaches via analysis of example scenes. Simulation results show considerable search-space reductions, and our placement heuristics perform close to optimal: the average ratio of the communication overhead of our heuristics to that of the optimal placement was 1.05. Our uniform partitioning achieved an average load-balance ratio of 1.47 for geometry processing and 1.44 for rasterization, which is reasonable.

Keywords: Data Partitioning and Placement, Graphics, PIM, Search Space Reduction.
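The abstract reports two metrics, a load-balance ratio for partitioning and a communication overhead for placement, without defining them here. The Python sketch that follows illustrates both under stated assumptions, not the authors' definitions: the load-balance ratio is taken as maximum node load divided by mean node load, PIM nodes are assumed to sit on a hypothetical 2D mesh, and communication cost is inter-partition traffic weighted by Manhattan hop distance. Exhaustive search over all n! placements stands in for the "optimal" placement, and a generic pairwise-swap hill climb stands in for a placement heuristic. Neither is the paper's method, and every function name and data value is hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple


def load_balance_ratio(loads: Sequence[float]) -> float:
    """Max node load / mean node load (assumed metric); 1.0 is perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


def mesh_coords(n_nodes: int, width: int) -> List[Tuple[int, int]]:
    """(row, col) of each PIM node on a hypothetical row-major 2D mesh."""
    return [(k // width, k % width) for k in range(n_nodes)]


def comm_cost(placement: Sequence[int],
              traffic: Sequence[Sequence[float]],
              coords: Sequence[Tuple[int, int]]) -> float:
    """Sum over partition pairs i < j of traffic[i][j] times hop distance.

    placement[p] is the node hosting partition p (assumed cost model).
    """
    cost = 0.0
    n = len(placement)
    for i in range(n):
        for j in range(i + 1, n):
            (r1, c1), (r2, c2) = coords[placement[i]], coords[placement[j]]
            cost += traffic[i][j] * (abs(r1 - r2) + abs(c1 - c2))
    return cost


def optimal_placement(traffic, coords):
    """Exhaustive search over all n! placements; feasible only for small n."""
    n = len(traffic)
    best = min(itertools.permutations(range(n)),
               key=lambda p: comm_cost(p, traffic, coords))
    return list(best), comm_cost(best, traffic, coords)


def swap_heuristic(traffic, coords):
    """Generic pairwise-swap hill climbing, starting from the identity placement."""
    n = len(traffic)
    placement = list(range(n))
    best = comm_cost(placement, traffic, coords)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                placement[i], placement[j] = placement[j], placement[i]
                cost = comm_cost(placement, traffic, coords)
                if cost < best:
                    best = cost          # keep the improving swap
                    improved = True
                else:
                    placement[i], placement[j] = placement[j], placement[i]  # undo
    return placement, best


if __name__ == "__main__":
    # Hypothetical per-node workloads (e.g. triangle counts) for 4 uniform partitions.
    loads = [120, 80, 100, 100]
    print("load-balance ratio:", load_balance_ratio(loads))
    coords = mesh_coords(4, width=2)
    # Hypothetical symmetric traffic between partitions 0..3.
    traffic = [[0, 1, 0, 5],
               [1, 0, 5, 0],
               [0, 5, 0, 1],
               [5, 0, 1, 0]]
    opt_p, opt_c = optimal_placement(traffic, coords)
    heu_p, heu_c = swap_heuristic(traffic, coords)
    print("optimal:", opt_p, opt_c, "heuristic:", heu_p, heu_c,
          "ratio:", heu_c / opt_c)
```

Exhaustive search grows as n!, so it is usable only for a handful of nodes (16! is already about 2.1 × 10^13), which is why a reduced search space or a heuristic matters at realistic sizes. A swap heuristic can stop at a local minimum, so its cost ratio to the optimum is always 1.0 or greater.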

Similar resources

An Efficient PIM Architecture for Computer Graphics

Jae Chul Cha and Sandeep K. Gupta, {jaecha, sandeep}@poisson.usc.edu. Abstract: Rapid advances in manufacturing technology have enabled us to build higher-performance graphics processors in much smaller area. Based on current trends, we can predict that Systems-on-Chip (SoC) with a substantial number of graphics processors will emerge in ...

Efficient Data Placement for Processor-in-memory Array Processors

As one of the point design teams for the PetaFlop supercomputer project sponsored by NSF, NASA, etc., we propose the study of the PIM (Processor-In-Memory) massive parallel architecture. To efficiently execute an application on the PIM array processors, a good data partitioning, which minimizes the interprocessor communication, is required. Default partitioning such as row-wise or columnwise ma...

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computing the Longest Prefix Match (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

Publication date: 2008